


**EP0638858**

**Publication Title:**

**Pipeline data processing apparatus having small power consumption**

**Abstract:**

In a pipeline processing apparatus including a plurality of serially connected stages (ST1, ST2,...), a plurality of clock signals (CLK1, CLK2,...) are supplied to the stages individually. The clock signals can be individually stopped.




Europäisches Patentamt  
European Patent Office  
Office européen des brevets



Publication number: 0 638 858 A1

## EUROPEAN PATENT APPLICATION

Application number: 94112140.2

Int. Cl. 8: G06F 1/32, G06F 9/38

Date of filing: 03.08.94

Priority: 03.08.93 JP 210932/93

Date of publication of application: 15.02.95 Bulletin 95/07

Designated Contracting States: DE FR GB

Applicant: NEC CORPORATION, 7-1, Shiba 5-chome, Minato-ku, Tokyo (JP)

Inventor: Nakayama, Takashi, c/o NEC Corporation, 7-1 Shiba 5-chome, Minato-ku, Tokyo (JP)

Representative: Betten & Resch, Reichenbachstrasse 19, D-80469 München (DE)

Pipeline data processing apparatus having small power consumption.

In a pipeline processing apparatus including a plurality of serially connected stages (ST<sub>1</sub>, ST<sub>2</sub>, ...), a plurality of clock signals (CLK<sub>1</sub>, CLK<sub>2</sub>, ...) are supplied to the stages individually. The clock signals can be individually stopped.

Fig. 4




## BACKGROUND OF THE INVENTION

## Field of the Invention

The present invention relates to a pipeline processing apparatus including a plurality of serially-connected stages, each stage having a plurality of flip-flops and a logic gate combination circuit.

## Description of the Related Art

A microprocessor such as a pipeline processing apparatus includes a plurality of stages each having a plurality of flip-flops and a logic gate combination circuit. About half of the entire power consumption is dissipated in the flip-flops. Also, about half of that half is dissipated in a clock driving circuit for driving a clock signal supplied to the flip-flops, and the remainder of that half is dissipated in the flip-flops per se and their outputs.

Generally, in a complementary metal oxide semiconductor (CMOS) large scale integrated circuit (LSI), power consumption is mainly dependent upon dynamic power consumption caused by charging and discharging operations performed upon a load capacity, and can be represented by (see Neil Weste et al, "PRINCIPLES OF CMOS VLSI DESIGN", pp. 144-149, 1985)

$$P = C_L V_{DD}^2 f_p \quad (1)$$

where

$P$ is a power consumption;

$C_L$ is a load capacity;

$V_{DD}$ is a power supply voltage; and

$f_p$ is the frequency of a signal. If the signal is a clock signal whose frequency is $f_c$, then $f_p = f_c$. If the signal is an output signal of a flip-flop, then $f_p \approx 1/4 f_c$ in view of the probability of transition of the output signal from high to low and vice versa.

In the pipeline processing apparatus, however, the output signals of the flip-flops are not always changed from high to low or vice versa in accordance with the clock signal. Each output signal of the flip-flops may be changed once for ten clock signals on the average, and in this case, $f_p \approx 1/10 f_c$. This means about 90 % of the power consumption dissipated in the clock driver circuit can be wasted. This will be explained later in detail.
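Equation (1) can be evaluated directly. The following is a minimal sketch in Python; the function name `dynamic_power` is hypothetical, and the example figures (a 0.06 pF node, 5 V, 50 MHz) are borrowed from the conditions assumed later in this description purely for illustration.

```python
def dynamic_power(c_load, v_dd, f_p):
    """Dynamic CMOS power P = C_L * V_DD^2 * f_p in watts, per equation (1)."""
    return c_load * v_dd ** 2 * f_p


F_C = 50e6  # clock frequency in Hz (illustrative value)

# A clock node toggles on every clock period, so f_p = f_c.
p_clock = dynamic_power(0.06e-12, 5.0, F_C)

# A data node that changes far less often, here taken as f_p = f_c / 10.
p_data = dynamic_power(0.06e-12, 5.0, F_C / 10)

print(f"clock node: {p_clock * 1e6:.1f} uW, data node: {p_data * 1e6:.1f} uW")
```

For equal capacitance, the clock node dissipates ten times the power of the data node, which is why the clock network dominates the budget.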

## SUMMARY OF THE INVENTION

It is an object of the present invention to reduce the power consumption of a pipeline processing apparatus including a plurality of serially-connected stages each having at least a plurality of flip-flops.

According to the present invention, in a pipeline processing apparatus including a plurality of serially-connected stages, a plurality of clock signals are supplied to the stages individually. The clock signals can be individually stopped, so that wasteful transitions of the clock signals are reduced.

## BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be more clearly understood from the description as set forth below, in comparison with the prior art, with reference to the accompanying drawings, wherein:

Fig. 1 is a circuit diagram illustrating a prior art pipeline processing apparatus;

Figs. 2A through 2F are timing diagrams showing the operation of the circuit of Fig. 1;

Figs. 3A and 3B are circuit diagrams illustrating examples of flip-flops;

Fig. 4 is a circuit diagram illustrating a first embodiment of the pipeline processing apparatus according to the present invention;

Figs. 5A through 5K are timing diagrams showing the operation of the circuit of Fig. 4;

Fig. 6 is a circuit diagram illustrating a second embodiment of the pipeline processing apparatus according to the present invention;

Figs. 7A through 7K are timing diagrams showing the operation of the circuit of Fig. 6; and

Fig. 8 is a circuit diagram illustrating a third embodiment of the pipeline processing apparatus according to the present invention.

## DESCRIPTION OF THE PREFERRED EMBODIMENT

Before the description of the preferred embodiment, a prior art pipeline processing apparatus will be explained with reference to Figs. 1, 2A through 2F, and 3A and 3B.

In Fig. 1, which illustrates a prior art pipeline processing apparatus, a plurality of stages ST<sub>1</sub>, ST<sub>2</sub>, ... are provided. The first stage ST<sub>1</sub> is comprised of flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... for receiving data DA<sub>1</sub>, DA<sub>2</sub>, ... , respectively, and a logic gate combination circuit C<sub>1</sub> for receiving the output data DB<sub>1</sub>, DB<sub>2</sub>, ... of the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... . In this case, the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... are of a D-type which operate in response to their input rise edges. Note that the logic gate combination circuit C<sub>1</sub> is comprised of logic gates such as AND circuits, NAND circuits, OR circuits and NOR circuits, but includes no flip-flop and no latch circuit. Also, the data DA<sub>1</sub>, DA<sub>2</sub>, ... are supplied to the first stage ST<sub>1</sub> through selectors SEL<sub>1</sub>, SEL<sub>2</sub>, ... which are controlled by a stall signal STL for stopping the pipeline operation of the pipeline processing apparatus. That is, when the stall signal STL is low (= "0"), the data DA<sub>1</sub>, DA<sub>2</sub>, ... are supplied to the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... , so that the contents of the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... are changed. On the other hand, when the stall signal STL is high (= "1"), the output signals of the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... are fed back to the inputs thereof, so that the contents of the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... are not changed.

Also, the second stage ST<sub>2</sub> is comprised of flip-flops FF<sub>21</sub>, FF<sub>22</sub>, ... , and a logic gate combination circuit C<sub>2</sub> for receiving the output data DC<sub>1</sub>, DC<sub>2</sub>, ... of the flip-flops FF<sub>21</sub>, FF<sub>22</sub>, ... . The third stage ST<sub>3</sub> and its post-stages have the same configuration as the second stage ST<sub>2</sub>.

Further, a clock signal CLK is supplied to all the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... , FF<sub>21</sub>, FF<sub>22</sub>, ... , FF<sub>31</sub>, FF<sub>32</sub>, ... , and therefore, the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... , FF<sub>21</sub>, FF<sub>22</sub>, ... , FF<sub>31</sub>, FF<sub>32</sub>, ... are simultaneously operated.

The operation of the pipeline processing apparatus of Fig. 1 will now be explained with reference to Figs. 2A through 2F.

As shown in Figs. 2A and 2B, the clock signal CLK and the data DA (DA<sub>1</sub>, DA<sub>2</sub>, ...) are always generated. In this state, as shown in Fig. 2C, the stall signal STL is "0" at rise-edge timings t<sub>0</sub>, t<sub>1</sub>, t<sub>4</sub> and t<sub>5</sub> of the clock signal CLK, so that the selectors SEL<sub>1</sub>, SEL<sub>2</sub>, ... select the data DA. As a result, the output data DB (DB<sub>1</sub>, DB<sub>2</sub>, ...) of the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... are data obtained by delaying the data DA by one clock time period $\Delta T$, as shown in Fig. 2D. On the other hand, as shown in Fig. 2C, the stall signal STL is "1" at rise-edge timings t<sub>2</sub>, t<sub>3</sub> and t<sub>6</sub> of the clock signal CLK, so that the output data DB of the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... are not changed as shown in Fig. 2D. Also, the second stage ST<sub>2</sub> and its post stages always receive the clock signal CLK, and therefore, the operation results of the logic gate combination circuits C<sub>1</sub>, C<sub>2</sub>, ... based upon the outputs of their prestage flip-flops are written into the flip-flops of the second stage ST<sub>2</sub> and its post stages, as shown in Figs. 2E and 2F.
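The selector-based stall of Fig. 1 can be modelled behaviourally. The sketch below is an illustration only: the function name `simulate_first_stage` and the data labels `A0` to `A6` are hypothetical, and the stall pattern assumes STL is "1" at t<sub>2</sub>, t<sub>3</sub> and t<sub>6</sub>.

```python
def simulate_first_stage(da, stl, initial=None):
    """Return DB after each rising clock edge t0, t1, ... for the prior-art stage.

    da[i]  : data at the selector input at edge t_i
    stl[i] : stall signal sampled at edge t_i (1 = stalled)
    """
    db = initial
    trace = []
    for d, stalled in zip(da, stl):
        # Selector: recirculate the flip-flop output when stalled, else take DA.
        d_in = db if stalled else d
        # The flip-flop is clocked on every edge, whether stalled or not.
        db = d_in
        trace.append(db)
    return trace


da = ["A0", "A1", "A2", "A3", "A4", "A5", "A6"]
stl = [0, 0, 1, 1, 0, 0, 1]          # "1" at t2, t3 and t6
db_trace = simulate_first_stage(da, stl)
clock_edges_seen = len(stl)          # every edge reaches the flip-flops
useful_edges = stl.count(0)          # edges that actually load new data
print(db_trace, clock_edges_seen, useful_edges)
```

In this model the flip-flops receive seven clock edges but only four of them load new data; the other three merely rewrite the value already held.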

In the pipeline processing apparatus of Fig. 1, however, even during time periods where the contents of the flip-flops are not changed due to the stall signal STL, the flip-flops receive the clock signal CLK so as to operate them (see t<sub>2</sub> and t<sub>3</sub> of Fig. 2D; t<sub>3</sub> and t<sub>4</sub> of Fig. 2E and t<sub>4</sub> and t<sub>5</sub> of Fig. 2F). This increases the power consumption.

For example, if each of the flip-flops is of a static type as illustrated in Fig. 3A, the power consumption of the pipeline processing apparatus of Fig. 1 is calculated below. Here, the following conditions are assumed:

- an input capacity of the clock signal CLK to each flip-flop = 0.06 pF;
- an internal load capacity of each flip-flop = 0.07 pF;
- an input capacity of the stall signal STL to each of the selectors SEL<sub>1</sub>, SEL<sub>2</sub>, ... = 0.04 pF;
- an internal load capacity of each of the selectors SEL<sub>1</sub>, SEL<sub>2</sub>, ... = 0.05 pF;
- the number of the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... = 40;
- the number of the flip-flops FF<sub>21</sub>, FF<sub>22</sub>, ... = 20;
- the number of the flip-flops FF<sub>31</sub>, FF<sub>32</sub>, ... = 30;
- an internal load capacity of the logic gate combination circuit C<sub>1</sub> = 20 pF;
- an internal load capacity of the logic gate combination circuit C<sub>2</sub> = 10 pF;
- V<sub>DD</sub> = 5V;
- the frequency f<sub>c</sub> of the clock signal CLK = 50 MHz;
- the probability of "1" within the stall signal STL = 2/5, i.e., the frequency of the stall signal STL = 2/5 • 50 MHz; and
- the frequency of other logic signals = f<sub>c</sub>/4 (due to the fact that the probability of transition of the output signal of each flip-flop at unstalled timings where the output signal is expected to change is 1/2, i.e., the transition frequency is 1/4 f<sub>c</sub>).

Therefore, from the equation (1),

$$\begin{aligned}
 P = & (40 + 20 + 30) \cdot 0.06 \cdot 10^{-12} \times 5^2 \times 50 \cdot 10^6 \\
 & + (40 + 20 + 30) \cdot 0.07 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^6 \\
 & + 40 \cdot 0.04 \cdot 10^{-12} \times 5^2 \times 2/5 \cdot 50 \cdot 10^6 \\
 & + 40 \cdot 0.05 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^6 \\
 & + (20 + 10) \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^6 \quad (2)
 \end{aligned}$$

where the first term is a power consumption dissipated by the clock signal CLK; the second term is a power consumption dissipated within the flip-flops; the third term is a power consumption dissipated by the stall signal STL; the fourth term is a power consumption dissipated in the selectors SEL<sub>1</sub>, SEL<sub>2</sub>, ...; and the fifth term is a power consumption dissipated in the logic gate combination circuits C<sub>1</sub> and C<sub>2</sub>. The equation (2) is represented by


$$\begin{aligned}
 P = & (6.75 + 0.79 + 0.80 + 0.25 + 3.75) \cdot 10^{-3} \\
 = & 12.34 \text{mW} \quad (3)
 \end{aligned}$$

Thus, 55 percent of the entire power consumption (12.34 mW) is dissipated by the clock signal CLK, and 2/5 of this power consumption (22 percent of the entire power consumption) is dissipated when the pipeline processing apparatus is stalled. Also, 9 percent (0.80 + 0.25 mW) of the entire power consumption is dissipated in the selectors SEL<sub>1</sub>, SEL<sub>2</sub>, ... In other words, 31 percent of the entire power consumption does not contribute to the pipeline operation of the pipeline processing apparatus, and therefore, the 31 percent of the entire power consumption is wasteful.
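The five terms of equation (2) can be reproduced numerically from the assumed conditions. The following Python sketch is illustrative only; the variable names are hypothetical, and the factors are taken term by term from equation (2).

```python
V2 = 5.0 ** 2          # V_DD squared, in V^2
F_C = 50e6             # clock frequency, in Hz
PF = 1e-12             # one picofarad, in farads
N_FF = 40 + 20 + 30    # flip-flops in stages 1 to 3
STALL = 2 / 5          # stall-signal factor as used in equation (2)

terms_w = {
    "clock":      N_FF * 0.06 * PF * V2 * F_C,
    "flip_flops": N_FF * 0.07 * PF * V2 * (1 / 4) * STALL * F_C,
    "stall":      40 * 0.04 * PF * V2 * STALL * F_C,
    "selectors":  40 * 0.05 * PF * V2 * (1 / 4) * STALL * F_C,
    "logic":      (20 + 10) * PF * V2 * (1 / 4) * STALL * F_C,
}
total_mw = sum(terms_w.values()) * 1e3

for name, p in terms_w.items():
    print(f"{name:10s} {p * 1e3:5.2f} mW")
print(f"{'total':10s} {total_mw:5.2f} mW")
```

The computed total agrees with equation (3) to within rounding.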

On the other hand, if each of the flip-flops is of a dynamic type as illustrated in Fig. 3B, since the number of transistors is reduced as compared with the static type flip-flop as illustrated in Fig. 3A, the input capacity of the clock signal CLK to each flip-flop is decreased from 0.06 pF to 0.03 pF, and the internal load capacity of each flip-flop is decreased from 0.07 pF to 0.04 pF. Therefore, the first term of the equation (2) is replaced by

$$(40 + 20 + 30) \cdot 0.03 \cdot 10^{-12} \times 5^2 \times 50 \cdot 10^6$$

and the second term of the equation (2) is replaced by

$$(40 + 20 + 30) \cdot 0.04 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^6$$

Therefore, in this case, the entire power consumption P is represented by

$$\begin{aligned}
 P = & (3.38 + 0.45 + 0.80 + 0.25 + 3.75) \cdot 10^{-3} \\
 = & 8.63 \text{mW} \quad (4)
 \end{aligned}$$

Thus, 39 percent of the entire power consumption (8.63 mW) is dissipated by the clock signal CLK, and 2/5 of this power consumption (16 percent of the entire power consumption) is dissipated when the pipeline processing apparatus is stalled. Also, 12 percent (0.80 + 0.25 mW) of the entire power consumption is dissipated in the selectors SEL<sub>1</sub>, SEL<sub>2</sub>, ... In other words, 28 percent of the entire power consumption does not contribute to the pipeline operation of the pipeline processing apparatus, and therefore, the 28 percent of the entire power consumption is wasteful.
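The percentages quoted for the dynamic-type case follow from the term values of equation (4). A short Python check, with hypothetical variable names and the term values copied from equation (4), might look like this:

```python
# Term values of equation (4) in mW, as stated in the text.
clock, flip_flops, stall, selectors, logic = 3.38, 0.45, 0.80, 0.25, 3.75
total = clock + flip_flops + stall + selectors + logic

clock_share = clock / total                      # clock signal CLK
stalled_clock_share = (2 / 5) * clock / total    # clock power spent while stalled
selector_share = (stall + selectors) / total     # stall signal plus selectors
wasted_share = stalled_clock_share + selector_share

for label, share in [
    ("clock", clock_share),
    ("clock while stalled", stalled_clock_share),
    ("selectors", selector_share),
    ("wasted", wasted_share),
]:
    print(f"{label}: {share:.0%}")
```

The wasted share is the sum of the clock power spent during stalls and the power spent in the stall selectors, neither of which advances the pipeline.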

In Fig. 4, which illustrates a first embodiment of the present invention using static type flip-flops as illustrated in Fig. 3A, flip-flops FF<sub>10</sub>, FF<sub>20</sub>, ... and OR circuits G<sub>1</sub>, G<sub>2</sub>, G<sub>3</sub>, ... are added to the elements of Fig. 1, and the selectors SEL<sub>1</sub>, SEL<sub>2</sub>, ... of Fig. 1 are deleted. Each of the flip-flops FF<sub>10</sub>, FF<sub>20</sub>, ... delays stall signals STL1 (=STL), STL2, ... by one clock time period $\Delta T$. The OR circuits G<sub>1</sub>, G<sub>2</sub>, G<sub>3</sub>, ... turn ON and OFF the clock signal in accordance with the stall signals STL1, STL2, STL3, ... Thus, the stall signals STL1, STL2, STL3, ... having one clock time period $\Delta T$ therebetween are generated. As a result, the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... are operated in accordance with a clock signal CLK1 which is an OR logic between the clock signal CLK and the stall signal STL1, the flip-flops FF<sub>21</sub>, FF<sub>22</sub>, ... are operated in accordance with a clock signal CLK2 which is an OR logic between the clock signal CLK and the stall signal STL2, and the flip-flops FF<sub>31</sub>, FF<sub>32</sub>, ... are operated in accordance with a clock signal CLK3 which is an OR logic between the clock signal CLK and the stall signal STL3.

The operation of the pipeline processing apparatus of Fig. 4 will now be explained with reference to Figs. 5A through 5K.

As shown in Figs. 5A and 5B, the clock signal CLK and the data DA (DA<sub>1</sub>, DA<sub>2</sub>, ...) are also always generated. In this state, as shown in Fig. 5C, the stall signal STL1 is "0" at rising-edge timings t<sub>0</sub>, t<sub>1</sub>, t<sub>4</sub> and t<sub>5</sub> of the clock signal CLK and is "1" at rising edge timings t<sub>2</sub>, t<sub>3</sub> and t<sub>6</sub> of the clock signal CLK. As a result, the clock signal CLK1 for the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... rises at only timings t<sub>0</sub>, t<sub>1</sub>, t<sub>4</sub> and t<sub>5</sub>, as shown in Fig. 5D. Therefore, the data DB (DB<sub>1</sub>, DB<sub>2</sub>, ...) of the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... are changed at only timings t<sub>0</sub>, t<sub>1</sub>, t<sub>4</sub> and t<sub>5</sub>, as shown in Fig. 5E, and therefore, the output data DB are obtained by delaying the data DA by one clock time period  $\Delta T$ .

Also, as shown in Fig. 5F, the stall signal STL2 is delayed as compared with the stall signal STL1 by one clock time period  $\Delta T$ . Therefore, as shown in Fig. 5F, the stall signal STL2 is "0" at rising-edge timings t<sub>0</sub>, t<sub>1</sub>, t<sub>2</sub>, t<sub>5</sub> and t<sub>6</sub> of the clock signal CLK and is "1" at rising edge timings t<sub>3</sub> and t<sub>4</sub> of the clock signal CLK. As a result, the clock signal CLK2 for the flip-flops FF<sub>21</sub>, FF<sub>22</sub>, ... rises at only timings t<sub>0</sub>, t<sub>1</sub>, t<sub>2</sub>, t<sub>5</sub> and t<sub>6</sub>, as shown in Fig. 5G. Therefore, the data DC (DC<sub>1</sub>, DC<sub>2</sub>, ...) of the flip-flops FF<sub>21</sub>, FF<sub>22</sub>, ... are changed at only timings t<sub>0</sub>, t<sub>1</sub>, t<sub>2</sub>, t<sub>5</sub> and t<sub>6</sub>, as shown in Fig. 5H, and therefore, the output data DC are obtained by delaying the data DB by one clock time period  $\Delta T$ .

Also, as shown in Fig. 5I, the stall signal STL3 is delayed as compared with the stall signal STL2 by one clock time period $\Delta T$. Therefore, as shown in Fig. 5I, the stall signal STL3 is "0" at rising-edge timings t<sub>0</sub>, t<sub>1</sub>, t<sub>2</sub>, t<sub>3</sub> and t<sub>6</sub> of the clock signal CLK and is "1" at rising edge timings t<sub>4</sub> and t<sub>5</sub> of the clock signal CLK. As a result, the clock signal CLK3 for the flip-flops FF<sub>31</sub>, FF<sub>32</sub>, ... rises at only timings t<sub>0</sub>, t<sub>1</sub>, t<sub>2</sub>, t<sub>3</sub> and t<sub>6</sub>, as shown in Fig. 5J. Therefore, the data DD (DD<sub>1</sub>, DD<sub>2</sub>, ...) of the flip-flops FF<sub>31</sub>, FF<sub>32</sub>, ... are changed at only timings t<sub>0</sub>, t<sub>1</sub>, t<sub>2</sub>, t<sub>3</sub> and t<sub>6</sub>, as shown in Fig. 5K, and therefore, the output data DD are obtained by delaying the data DC by one clock time period $\Delta T$.

Thus, according to the first embodiment, during a stall period where the change of the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ..., FF<sub>21</sub>, FF<sub>22</sub>, ..., FF<sub>31</sub>, FF<sub>32</sub>, ... is unnecessary, the generation of the clock signals CLK1, CLK2, CLK3, ... is stopped, thus reducing the power consumption.
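The per-stage clock gating described above can be checked with a small behavioural model. The sketch below is an illustration under stated assumptions: each clock period is modelled as a low half followed by a high half, the delay flip-flops FF<sub>10</sub>, FF<sub>20</sub> are assumed to hold "0" initially, and the names `delay_chain`, `gated_clock` and `rising_edge_cycles` are hypothetical.

```python
def delay_chain(stl, stages):
    """Return [STL1, STL2, ...]; each signal is the previous one delayed by one period."""
    signals = [list(stl)]
    for _ in range(stages - 1):
        prev = signals[-1]
        signals.append([0] + prev[:-1])  # assumed initial flip-flop content: 0
    return signals


def gated_clock(stl):
    """Waveform of CLK OR STLn, two samples (low half, high half) per period."""
    wave = []
    for s in stl:
        for clk in (0, 1):
            wave.append(clk | s)
    return wave


def rising_edge_cycles(wave):
    """Indices i of the periods t_i in which the waveform rises from 0 to 1."""
    prev = 1  # assume the gated clock was high just before t0
    cycles = []
    for k, level in enumerate(wave):
        if prev == 0 and level == 1:
            cycles.append(k // 2)
        prev = level
    return cycles


stl = [0, 0, 1, 1, 0, 0, 1]  # STL at t0..t6
edges = [rising_edge_cycles(gated_clock(s)) for s in delay_chain(stl, 3)]
print(edges)
```

In this model, CLK1 rises at t0, t1, t4 and t5, CLK2 at t0, t1, t2, t5 and t6, and CLK3 at t0, t1, t2, t3 and t6, matching the timings given for Figs. 5D, 5G and 5J.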

An actual power consumption of the pipeline processing apparatus of Fig. 4 will be explained as compared with that of the pipeline processing apparatus of Fig. 1. The power consumption of the pipeline processing apparatus of Fig. 4 is calculated below. Here, the following conditions are assumed:

- an input capacity of the clock signal CLK to each flip-flop = 0.06 pF;
- an internal load capacity of each flip-flop = 0.07 pF;
- an input capacity of each of the OR circuits G<sub>1</sub>, G<sub>2</sub>, ... = 0.10 pF;
- an internal load capacity of each of the OR circuits G<sub>1</sub>, G<sub>2</sub>, ... = 0.50 pF;
- the number of the flip-flops FF<sub>11</sub>, FF<sub>12</sub>, ... = 40;
- the number of the flip-flops FF<sub>21</sub>, FF<sub>22</sub>, ... = 20;
- the number of the flip-flops FF<sub>31</sub>, FF<sub>32</sub>, ... = 30;
- an internal load capacity of the logic gate combination circuit C<sub>1</sub> = 20 pF;
- an internal load capacity of the logic gate combination circuit C<sub>2</sub> = 10 pF;
- V<sub>DD</sub> = 5V;
- the frequency f<sub>c</sub> of the clock signal CLK = 50 MHz;
- the probability of "1" within the stall signal STL = 2/5, i.e., the frequency of the stall signal STL = 2/5 • 50 MHz; and
- the frequency of other logic signals = f<sub>c</sub>/4.

Therefore, from the equation (1),

$$\begin{aligned}
 P = & (2 \cdot 0.06 + 3 \cdot 0.10) \cdot 10^{-12} \times 5^2 \times 50 \cdot 10^6 \\
 & + (3 \cdot 0.50 + (40+20+30) \cdot 0.06) \cdot 10^{-12} \times 5^2 \times 2/5 \cdot 50 \cdot 10^6 \\
 & + (40+20+30) \cdot 0.07 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^6 \\
 & + (2 \cdot 0.07 + 3 \cdot 0.10) \cdot 10^{-12} \times 5^2 \times 2/5 \cdot 50 \cdot 10^6 \\
 & + (20+10) \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^6 \quad (5)
 \end{aligned}$$

where the first term is a power consumption dissipated by the clock signal CLK; the second term is a power consumption dissipated in the OR circuits G<sub>1</sub>, G<sub>2</sub> and G<sub>3</sub> and by the clock signals CLK1, CLK2 and CLK3; the third term is a power consumption dissipated within the flip-flops; the fourth term is a power consumption dissipated by the stall signal STL; and the fifth term is a power consumption dissipated in the logic gate combination circuits C<sub>1</sub> and C<sub>2</sub>. The equation (5) is represented by

$$\begin{aligned}
 P = & (0.53 + 3.45 + 0.79 + 0.22 + 3.75) \cdot 10^{-3} \\
 = & 8.74 \text{ mW} \quad (6)
 \end{aligned}$$

Thus, the entire power consumption (8.74 mW) in the first embodiment, validating all the rise edges of the clock signals CLK1, CLK2 and CLK3, can be reduced by 29 percent as compared with that (12.34 mW) of the prior art pipeline processing apparatus of Fig. 1. This is mainly because the power consumption by the clock signals is reduced by 42 percent as compared with the prior art pipeline processing apparatus of Fig. 1. Note that the reduction in power consumption by deleting the selectors $SEL_1$, $SEL_2$, ... and the increase in power consumption by delaying the stall signal STL cancel each other.
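The budget of equation (5) and the quoted saving can be recomputed from the assumed conditions. The following Python sketch is illustrative; the variable names are hypothetical, and the prior-art figure of 12.34 mW is taken from equation (3).

```python
V2, F_C, PF = 5.0 ** 2, 50e6, 1e-12   # V_DD^2, clock frequency (Hz), 1 pF
N_FF = 40 + 20 + 30                   # flip-flops in stages 1 to 3
STALL = 2 / 5                         # stall-signal factor as used in equation (5)

terms_w = [
    (2 * 0.06 + 3 * 0.10) * PF * V2 * F_C,             # common clock CLK
    (3 * 0.50 + N_FF * 0.06) * PF * V2 * STALL * F_C,  # OR circuits and CLK1..CLK3
    N_FF * 0.07 * PF * V2 * (1 / 4) * STALL * F_C,     # inside the flip-flops
    (2 * 0.07 + 3 * 0.10) * PF * V2 * STALL * F_C,     # stall signals
    (20 + 10) * PF * V2 * (1 / 4) * STALL * F_C,       # logic circuits C1 and C2
]
total_mw = sum(terms_w) * 1e3

PRIOR_ART_MW = 12.34                  # equation (3)
reduction = 1 - total_mw / PRIOR_ART_MW

print(f"total = {total_mw:.2f} mW, reduction = {reduction:.0%}")
```

Summing the unrounded terms gives a total that agrees with equation (6) to within rounding, and a reduction of about 29 percent.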

In Fig. 6, which illustrates a second embodiment of the present invention using dynamic type flip-flops as illustrated in Fig. 3B, an inverter $G_{11}$, an AND circuit $G_{12}$, and selectors $SEL_1$, $SEL_2$, ... are added to the elements of Fig. 4, to thereby carry out a refresh operation by a refresh signal REF. The refresh signal REF is a clock pulse signal which is made high (= "1") for every definite time period such as several $\mu$s.

The operation of the pipeline processing apparatus of Fig. 6 will now be explained with reference to Figs. 7A through 7K. When the refresh signal REF is "0", the operation of the pipeline processing apparatus of Fig. 6 is the same as that of the pipeline processing apparatus of Fig. 4 (see non-refresh mode of Figs. 7A through 7K). That is, the output signal of the inverter  $G_{11}$  is "1", so that the AND circuit  $G_{12}$  passes the stall signal STL therethrough. In this case,  $STL_1 = STL$ . Simultaneously, the selectors  $SEL_1$ ,  $SEL_2$ , ... select the data  $DA_1$ ,  $DA_2$ , ..., respectively.

When the stall signal  $STL_1$  (=  $STL$ ) is "1" and accordingly, the stall signals  $STL_2$  and  $STL_3$  are "1", the refresh signal REF is made "1", as shown in Fig. 7D, so that the control enters a refresh mode. That is, the selectors  $SEL_1$ ,  $SEL_2$ , ... select the outputs  $DB_1$ ,  $DB_2$ , ... (= DB), respectively, of the flip-flops  $FF_{11}$ ,  $FF_{12}$ , ... Also, since the refresh signal REF (= "1") is inverted by the inverter  $G_{11}$  and is supplied to the AND circuit  $G_{12}$ , the output signal of the AND circuit  $G_{12}$  is "0" regardless of the stall signal STL. That is, the stall signal  $STL_1$  is "0" as shown in Fig. 7E. As a result, the clock signal CLK passes through the OR circuit  $G_1$  only for a time period defined by the stall signal  $STL_1$  (= "0"), so that the clock signal CLK1 is formed by the clock signal CLK, as shown in Fig. 7F. Therefore, the outputs  $DB_1$ ,  $DB_2$ , ... (= DB) of the flip-flops  $FF_{11}$ ,  $FF_{12}$ , ... are again written thereinto, thus refreshing the flip-flops  $FF_{11}$ ,  $FF_{12}$ , ...

Also, the stall signal  $STL_1$  is latched by the flip-flop  $FF_{10}$ , and as a result, the stall signal  $STL_2$  is "0" for one clock time period  $\Delta T$ , as shown in Fig. 7H. Therefore, the clock signal CLK passes through the OR circuit  $G_2$  only for a time period defined by the stall signal  $STL_2$  (= "0"), so that the clock signal CLK2 is formed by the clock signal CLK, as shown in Fig. 7I. Thus, the outputs  $DC_1$ ,  $DC_2$ , ... (= DC) of the flip-flops  $FF_{21}$ ,  $FF_{22}$ , ... are again written thereinto, thus refreshing the flip-flops  $FF_{21}$ ,  $FF_{22}$ , ...

Further, the stall signal  $STL_2$  is latched by the flip-flop  $FF_{20}$ , and as a result, the stall signal  $STL_3$  is "0" for one clock time period  $\Delta T$ , as shown in Fig. 7K. Therefore, the clock signal CLK passes through the OR circuit  $G_3$  only for a time period defined by the stall signal  $STL_3$  (= "0"), so that the clock signal CLK3 is formed by the clock signal CLK, as shown in Fig. 7L. Thus, the outputs  $DD_1$ ,  $DD_2$ , ... (= DD) of the flip-flops  $FF_{31}$ ,  $FF_{32}$ , ... are again written thereinto, thus refreshing the flip-flops  $FF_{31}$ ,  $FF_{32}$ , ...

Note that the provision of the selectors  $SEL_1$ ,  $SEL_2$ , ... at the prestages of the flip-flops  $FF_{11}$ ,  $FF_{12}$ , ... makes it definitely possible to carry out a refresh operation upon the flip-flops  $FF_{11}$ ,  $FF_{12}$ , ... even when the data DA ( $DA_1$ ,  $DA_2$ , ...) are changed during a refresh mode. Contrary to this, during a stalling period, the input values of the flip-flops  $FF_{21}$ ,  $FF_{22}$ , ...,  $FF_{31}$ ,  $FF_{32}$ , ... are not changed, and therefore, no selectors are provided at the prestages of the flip-flops  $FF_{21}$ ,  $FF_{22}$ , ...,  $FF_{31}$ ,  $FF_{32}$ , ...
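The first-stage control of the second embodiment reduces to two small logic relations: STL1 = STL AND (NOT REF), and the selectors choose the fed-back output when REF is "1". The Python sketch below models one rising edge of CLK under those relations; the function names are hypothetical, and the case REF = "1" with STL = "0" is not described in the text and is not modelled here beyond what the relations imply.

```python
def first_stage_controls(stl, ref):
    """Return (stl1, select_feedback) for the first stage.

    stl1            = STL AND (NOT REF)   (inverter G11 and AND circuit G12)
    select_feedback = REF                 (selectors pick DB instead of DA)
    """
    stl1 = stl & (1 - ref)
    return stl1, bool(ref)


def first_stage_next(da, db, stl, ref):
    """Return (new_db, clocked) for one rising edge of CLK."""
    stl1, select_feedback = first_stage_controls(stl, ref)
    if stl1:
        # CLK1 = CLK OR STL1 stays high: no edge, the flip-flops keep their charge.
        return db, False
    # CLK1 rises: write either fresh data or the fed-back output (refresh).
    return (db if select_feedback else da), True


print(first_stage_next("new", "old", stl=0, ref=0))  # normal operation
print(first_stage_next("new", "old", stl=1, ref=0))  # stalled, clock stopped
print(first_stage_next("new", "old", stl=1, ref=1))  # stalled, refresh cycle
```

During a refresh cycle the flip-flops are clocked once and rewritten with their own contents, so the stored value survives even though DA may have changed.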

Thus, according to the second embodiment, during a stall period where the change of the flip-flops  $FF_{11}$ ,  $FF_{12}$ , ...,  $FF_{21}$ ,  $FF_{22}$ , ...,  $FF_{31}$ ,  $FF_{32}$ , ... is unnecessary, the generation of the clock signals CLK1, CLK2, CLK3, ... is stopped, thus reducing the power consumption.

The actual power consumption of the pipeline processing apparatus of Fig. 6 will be explained as compared with that of the pipeline processing apparatus of Fig. 1. The power consumption of the pipeline processing apparatus of Fig. 6 is calculated below. Here, the following conditions are assumed:

- an input capacity of the clock signal CLK to each flip-flop = 0.03 pF;
- an internal load capacity of each flip-flop = 0.04 pF;
- an input capacity of the refresh signal REF to each of the selectors $SEL_1$, $SEL_2$, ... = 0.05 pF;
- an internal load capacity of each of the selectors $SEL_1$, $SEL_2$, ... = 0.05 pF;
- an input capacity of each of the OR circuits $G_1$, $G_2$, ... = 0.10 pF;
- an internal load capacity of each of the OR circuits $G_1$, $G_2$, ... = 0.50 pF;
- the number of the flip-flops $FF_{11}$, $FF_{12}$, ... = 40;
- the number of the flip-flops $FF_{21}$, $FF_{22}$, ... = 20;
- the number of the flip-flops $FF_{31}$, $FF_{32}$, ... = 30;
- an internal load capacity of the logic gate combination circuit $C_1$ = 20 pF;
- an internal load capacity of the logic gate combination circuit $C_2$ = 10 pF;
- $V_{DD}$ = 5V;
- the frequency $f_c$ of the clock signal CLK = 50 MHz;
- the probability of "1" within the stall signal STL = 2/5, i.e., the frequency of the stall signal STL = 2/5 • 50 MHz; and
- the frequency of other logic signals = $f_c/4$.

Therefore, from the equation (1),

$$\begin{aligned}
 P = & (2 \cdot 0.03 + 3 \cdot 0.10) \cdot 10^{-12} \times 5^2 \times 50 \cdot 10^6 \\
 & + (3 \cdot 0.50 + (40 + 20 + 30) \cdot 0.03) \cdot 10^{-12} \times 5^2 \times (2/5 + 1/200) \cdot 50 \cdot 10^6 \\
 & + (40 + 20 + 30) \cdot 0.04 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^6 \\
 & + (2 \cdot 0.04 + 3 \cdot 0.10) \cdot 10^{-12} \times 5^2 \times (2/5 + 1/200) \cdot 50 \cdot 10^6 \\
 & + (20 + 10) \cdot 10^{-12} \times 5^2 \times 1/4 \cdot 2/5 \cdot 50 \cdot 10^6 \\
 & + 40 \cdot 0.04 \cdot 10^{-12} \times 5^2 \times 1/200 \cdot 50 \cdot 10^6 \\
 & + 40 \cdot 0.05 \cdot 10^{-12} \times 5^2 \times 1/4 \cdot (2/5 + 1/200) \cdot 50 \cdot 10^6 \quad (7)
 \end{aligned}$$

where the first term is a power consumption dissipated by the clock signal $CLK$; the second term is a power consumption dissipated in the OR circuits $G_1, G_2$ and $G_3$ and by the clock signals $CLK1, CLK2$ and $CLK3$; the third term is a power consumption dissipated within the flip-flops; the fourth term is a power consumption dissipated by the stall signal $STL$; the fifth term is a power consumption dissipated in the logic gate combination circuits $C_1$ and $C_2$; the sixth term is a power consumption dissipated by the refresh signal REF; and the seventh term is a power consumption dissipated in the selectors $SEL_1', SEL_2', \dots$. The equation (7) is represented by

$$\begin{aligned}
 P & = (0.45 + 1.44 + 0.45 + 0.19 + 3.75 + 0.01 + 0.25) \cdot 10^{-3} \\
 & = 6.54 \text{ mW} \quad (8)
 \end{aligned}$$

Thus, the entire power consumption (6.54 mW) in the second embodiment, validating all the rise edges of the clock signals $CLK1, CLK2$ and $CLK3$, can be reduced by 24 percent as compared with that (8.63 mW) of the prior art pipeline processing apparatus of Fig. 1. This is mainly because the power consumption by the clock signals is reduced by 42 percent as compared with the prior art pipeline processing apparatus of Fig. 1.
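The factor 1/200 in equation (7) is not derived in the text. One reading, offered here only as an assumption, is that a single refresh clock is issued once every 200 clock periods, which is consistent with the stated refresh interval of several $\mu$s:

$$T_c = \frac{1}{f_c} = \frac{1}{50 \cdot 10^6\ \text{Hz}} = 20\ \text{ns}, \qquad T_{REF} = 200 \cdot T_c = 4\ \mu\text{s}$$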

In the above-mentioned second embodiment as illustrated in Fig. 6, the logic gate combination circuits $C_1$ and $C_2$ can be of a dynamic type. In this case, the clock signals $CLK2$ and $CLK3$ are supplied to the logic gate combination circuits $C_1$ and $C_2$, respectively, as indicated by dotted arrows in Fig. 6. That is, when the clock signal $CLK2$ ($CLK3$) is high, the logic gate combination circuit $C_1$ ($C_2$) is precharged, while when the clock signal $CLK2$ ($CLK3$) is low, the logic gate combination circuit $C_1$ ($C_2$) carries out a logic operation. Thus, since the clock signal $CLK2$ ($CLK3$) masked by the stall signal $STL2$ ($STL3$) is supplied to the logic gate combination circuit $C_1$ ($C_2$), the power consumption dissipated by the clock signals can be reduced. In the prior art pipeline processing apparatus of Fig. 1, the clock signal $CLK$, which has more transitions than the clock signals $CLK2$ and $CLK3$, may be supplied directly to the logic gate combination circuits $C_1$ and $C_2$ which are of a dynamic type, thus increasing the power consumption.

In Fig. 8, which illustrates a third embodiment of the present invention, flip-flops $FF_{11}', FF_{12}', \dots$ and a logic gate combination circuit $C_1'$ are connected in parallel to the flip-flops $FF_{11}, FF_{12}, \dots$ and the logic gate combination circuit $C_1$ of Fig. 4. In other words, the first stage consists of two sub-stages, designated $ST_1$ and $ST_1'$. The sub-stages $ST_1$ and $ST_1'$ are switched by OR circuits $G_1$ and $G_1'$ and selectors $SEL_1''$ and $SEL_2''$ which are controlled by a decoding signal $DEC$.

In order to operate either the logic gate combination circuit  $C_1$  or the logic gate combination circuit  $C_1'$ , either the clock signal  $CLK1$  or the clock signal  $CLK1'$  is generated. For example, when the decoding signal  $DEC$  is "1", the clock signal  $CLK1$  is clocked, while when the decoding signal  $DEC$  is "0", the clock signal  $CLK1'$  is clocked.
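The selection between the two sub-stage clocks can be sketched as follows. The exact gate wiring of Fig. 8 is not given in the text, so the sketch assumes that each OR circuit receives, besides CLK and STL1, a disable term derived from DEC; the function name `substage_clocks` is hypothetical.

```python
def substage_clocks(clk, stl1, dec):
    """Return (CLK1, CLK1') for one sample of CLK.

    Assumed wiring: an OR-gated clock is held high (stopped) whenever its
    sub-stage is stalled or not selected by the decoding signal DEC.
    """
    clk1 = clk | stl1 | (1 - dec)   # toggles only when DEC = 1 and not stalled
    clk1_prime = clk | stl1 | dec   # toggles only when DEC = 0 and not stalled
    return clk1, clk1_prime


for dec in (1, 0):
    waveform = [substage_clocks(clk, 0, dec) for clk in (0, 1, 0, 1)]
    print(f"DEC={dec}: {waveform}")
```

With DEC = "1" only CLK1 follows CLK while CLK1' stays high, and with DEC = "0" the roles are exchanged, so the unselected sub-stage receives no clock edges.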

In the third embodiment as illustrated in Fig. 8, if the logic gate combination circuit $C_1$ forms an arithmetic and logic unit (ALU), the logic gate combination circuit $C_1'$ forms a barrel shifter, and the logic gate combination circuit $C_2$ forms a data cache memory, the pipeline processing apparatus of Fig. 8 can serve as a microprocessor.

Thus, in the third embodiment as illustrated in Fig. 8, even when there is a logic gate combination circuit which is rarely operated, the surplus power it would otherwise consume can be reduced.

In Fig. 8, a stage other than the first stage can also be double-staged. Also, a multi-stage having more than two sub stages can be adopted instead of the double-stage.

As explained hereinbefore, according to the present invention, in a pipeline processing apparatus, since power consumption dissipated by a clock signal can be reduced, an overall power consumption dissipated in the pipeline processing apparatus can be reduced.

## Claims

1. A pipeline processing apparatus comprising:  
a plurality of serially-connected stages ( $ST_1, ST_2, \dots$ ); and  
a clock signal generating means, connected to said stages, for generating a plurality of clock signals ( $CLK_1, CLK_2, \dots$ ) and transmitting them to said stages individually, each of the clock signals being for operating one of said stages.

2. An apparatus as set forth in claim 1, wherein said clock signal generating means receives a stall signal (STL) for stopping an operation of said apparatus to stop generation of said clock signals individually.

3. An apparatus as set forth in claim 1, wherein said clock generating means comprises:  
a plurality of stall signal generating means for generating a plurality of stall signals (STL1, STL2, $\dots$ ); and  
a plurality of gate circuits ( $G_1, G_2, \dots$ ), each connected to one of said stall signal generating means, each of said gate circuits receiving a common clock signal (CLK) and one of said stall signals and passing the common clock signal therethrough in accordance with one of said stall signals.

4. An apparatus as set forth in claim 3, wherein said plurality of stall signal generating means comprise a plurality of serially-connected delay circuits for receiving a main stall signal (STL) and delaying it to generate the plurality of stall signals.

5. An apparatus as set forth in claim 3, wherein said plurality of stall signal generating means comprises a plurality of serially-connected flip-flops ( $FF_{10}, FF_{20}, \dots$ ) clocked by the common clock signal to generate the plurality of stall signals having a delay time period therebetween determined by the main clock signal.

6. An apparatus as set forth in claim 1, wherein said clock signal generating means comprises:  
means for generating a plurality of stall signals (STL1, STL2, $\dots$ ) having delay time periods in response to operations of said stages; and  
means for carrying out logic operations between the stall signals and a common clock signal (CLK) to generate the clock signals in accordance with results of the logic operations.

7. An apparatus as set forth in claim 2, wherein said stages are of a dynamic type,  
said apparatus further comprising means for receiving a refresh signal (REF) to stop generation of the stall signal.

8. An apparatus as set forth in claim 3, wherein said stages are of a dynamic type,  
said apparatus further comprising means for receiving a refresh signal (REF) to stop generation of the stall signals.

9. An apparatus as set forth in claim 1, wherein each of said stages comprises:  
a plurality of flip-flops ( $FF_{11}, FF_{12}, \dots, FF_{21}, FF_{22}, \dots, FF_{31}, FF_{32}, \dots$ ), each clocked by one of the clock signals; and  
a logic gate combination circuit ( $C_1, C_2, \dots$ ) connected to outputs of said flip-flops.

10. An apparatus as set forth in claim 9, wherein said logic gate combination circuit is of a dynamic type clocked by one of the clock signals.

11. An apparatus as set forth in claim 1, wherein at least one of said stages includes a plurality of parallelly-connected sub stages (ST<sub>1</sub>, ST<sub>1</sub>'),  
said apparatus further comprising a decoding means (G<sub>1</sub>, G<sub>1</sub>') for receiving a decoding signal (DEC) to select one of said sub stages.

12. A pipeline processing apparatus comprising:  
a plurality of serially-connected stages (ST<sub>1</sub>, ST<sub>2</sub>, ...), each including a plurality of first flip-flops (FF<sub>11</sub>, FF<sub>12</sub>, ..., FF<sub>21</sub>, FF<sub>22</sub>, ..., FF<sub>31</sub>, FF<sub>32</sub>, ...), each clocked by one of the clock signals; and a logic gate combination circuit (C<sub>1</sub>, C<sub>2</sub>, ...) connected to outputs of said flip-flops;  
a plurality of serially-connected second flip-flops (FF<sub>10</sub>, FF<sub>20</sub>, ...) for delaying a main stall signal (STL1) to generate a plurality of stall signals (STL2, STL3, ...) having a delay time (ΔT) therebetween;  
a gate means (G<sub>1</sub>) for passing a common clock signal (CLK) therethrough in accordance with the main clock signal and transmitting a passed common clock signal to said first flip-flops of a first one of said stages; and  
a plurality of gate means (G<sub>2</sub>, G<sub>3</sub>, ...), each connected to one of said second flip-flops, each for passing the common clock signal therethrough in accordance with one of the stall signals and transmitting a passed common clock signal to said first flip-flops of one of said stages after the first one.

13. An apparatus as set forth in claim 12, wherein said first flip-flops of said stages are of a dynamic type,  
said apparatus further comprising means for receiving a refresh signal (REF) to stop generation of the main stall signal.

14. An apparatus as set forth in claim 13, further comprising selector means (SEL<sub>1</sub>', SEL<sub>2</sub>', ...) for feeding back output signals of said first flip-flops of the first stage in response to the refresh signal.

15. An apparatus as set forth in claim 13, wherein said logic gate combination circuit is of a dynamic type,  
said logic gate combination circuit being connected to one of said gate means to receive the passed main clock signal, so that said logic gate combination circuit carries out a precharging operation and a logic operation alternatively.

16. An apparatus as set forth in claim 12, wherein at least one of said stages includes a plurality of parallelly-connected sub stages (ST<sub>1</sub>, ST<sub>1</sub>'),  
said apparatus further comprising a decoding means (G<sub>1</sub>, G<sub>1</sub>') for receiving a decoding signal (DEC) to select one of said sub stages.

Fig. 1 PRIOR ART

Fig. 2A, Fig. 2B, Fig. 2C, Fig. 2D, Fig. 2E, Fig. 2F PRIOR ART

Fig. 3A PRIOR ART

Fig. 3B PRIOR ART

Fig. 4

Fig. 6

Fig. 8

European Patent Office

EUROPEAN SEARCH REPORT

Application Number: EP 94 11 2140

DOCUMENTS CONSIDERED TO BE RELEVANT

| Category | Citation of document with indication, where appropriate, of relevant passages | Relevant to claim | CLASSIFICATION OF THE APPLICATION (Int.Cl.6) |
|----------|-------------------------------------------------------------------------------|-------------------|----------------------------------------------|
| X | P. M. KOGGE 'The Architecture of Pipelined Computers'<br>1981, McGRAW-HILL, NEW YORK, US. | 1-4, 6, 9, 12 | G06F1/32<br>G06F9/38 |
| A | PROCEEDINGS OF THE FALL JOINT COMPUTER CONFERENCE 1965<br>pages 489 - 504<br>L. W. COTTEN 'Circuit Implementation of High-Speed Pipeline Systems'<br>* the whole document * | 1, 5, 12 | |
| A | PATENT ABSTRACTS OF JAPAN<br>vol. 17, no. 522 (P-1616) 20 September 1993<br>& JP-A-05 135 592 (NEC CORP.) 1 June 1993<br>* abstract * | 11, 16 | |
| A | PATENT ABSTRACTS OF JAPAN<br>vol. 9, no. 243 (P-392) 30 September 1985<br>& JP-A-60 095 643 (FUJITSU K. K.) 29 May 1985<br>* abstract * | 11, 16 | |
| A | 1992 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS 10 May 1992, CA US<br>pages 208 - 211<br>D. H. K. HOE ET AL. 'Pipelining of GaAs Dynamic Logic Circuits'<br>* the whole document * | 7, 8, 13 | |
| A | DE-A-28 25 770 (LICENTIA PATENT-VERWALTUNGS-GMBH) 3 January 1980 | | |

TECHNICAL FIELDS SEARCHED (Int.Cl.6): G06F

The present search report has been drawn up for all claims

| Place of search | Date of completion of the search | Examiner |
|-----------------|----------------------------------|----------|
| THE HAGUE | 11 November 1994 | Daskalakis, T |

CATEGORY OF CITED DOCUMENTS

- X : particularly relevant if taken alone
- Y : particularly relevant if combined with another document of the same category
- A : technological background
- O : non-written disclosure
- P : intermediate document
- T : theory or principle underlying the invention
- E : earlier patent document, but published on, or after the filing date
- D : document cited in the application
- L : document cited for other reasons
- & : member of the same patent family, corresponding document
